two means: Power consideration 
The a priori determination of a proper sample size necessary to achieve 
some specified power is an important problem encountered frequently in 
practical studies. To establish the needed sample size for a two-sample t test, 
researchers may conduct the power analysis by specifying scientifically 
important values as the underlying population means while using a variance 
estimate obtained from related research or pilot study. In order to take 
account of the variability of sample variance, this article considers two 
approaches to sample size determinations. One provides the sample size 
required to guarantee with a given assurance probability that the actual 
power exceeds the planned power. The other gives the necessary sample size 
such that the expected power attains the designated power level. The 
suggested paradigm of adjusted sample variance combines the existing 
procedures into one unified framework. Numerical results are presented to 
illustrate the usefulness and advantages of the proposed approaches that 
accommodate the stochastic nature of the sample variance. More 
importantly, supplementary computer programs are developed to aid the 
usefulness and implementation of the suggested techniques. The exposition 
helps to clarify discrepancy in the previous demonstration and to extend the 
development of sample size methodology. 


When designing research studies, the determination of sample size is 
an essential process in order to ensure there is adequate statistical power to 
detect scientifically credible effects. To make inferences about differences 
between two normal population means, the hypothesis testing procedure 
and corresponding sample size formula are well known and easy to apply. 
Specifically, Kupper and Hafner (1989) demonstrated that the particular 
procedure that considers statistical power performs amazingly well even for 
very small sample sizes. Also, recent discussions of related sample size 
issues can be found in Guo and Luh (2011), Jan and Shieh (2011), and 
Lawson and Fisher (2011). For important general procedures, see the 
comprehensive treatment in Cohen (1988), Desu and Raghavarao (1990), 
Kraemer and Thieman (1987), Murphy and Myors (2004), and Odeh and 
Fox (1991). Due to the prospective nature of advance research planning, it 
is difficult to assess the adequacy of selected configurations for model 
parameters including two population means and common error variance. 
However, the general guideline suggests that typical sources like previously 
published research and successful pilot studies can offer plausible and 
reasonable planning values for the vital model characteristics (Thabane et 
al. 2010). Nonetheless, it is good practice to consider a range of design 
variations to provide guidance about the achieved power levels and required 
sample sizes for the study. 

In view of the imprecision of parameter settings, Shiffler and 
Harwood (1985) investigated the effect on the realized α-risk of using a 
pilot sample variance to estimate the variance in the sample size formula for 
testing population means. Their results suggested that researchers should 
recognize that the actual type I error rates may be substantially different 
from the nominal level. Also, Browne (1995) examined the deficiency of 
using a sample variance to compute the sample size needed to achieve the 
planned power for one- and two-sample t tests. The empirical results of 
Browne showed that the actual power attained with the calculated sample 
size is quite likely to be less than the planned power. More importantly, he 
proposed to improve the underpowered condition by using the upper 
confidence limit of the sample variance rather than the sample variance 
estimate itself. The imputation of selected upper confidence limit for 
population variance in the sample size calculation can guarantee that the 
actual power will exceed the planned power with designated probability. In 
practice, the actual sample size required to guarantee the assurance 
probability with respect to the planned power is closely related to the 
underlying distributional property and magnitude of sample variance 
estimate from a pilot sample or related investigation. 

On the other hand, Kieser and Wassmer (1996), and Julious and 
Owen (2006) suggested an alternative approach to accommodate the 
variability of sample variance for sample size determination. Specifically, 
Kieser and Wassmer investigated the expected power performance of the 
adjusted sample variance method of Browne (1995), whereas Julious and 
Owen used the unadjusted sample variance to compute the necessary 
sample size so that the expected power of a two-sample t test will meet the 
planned power. Just as in the case of the power assurance probability 
consideration of Browne, the numerical illustration with respect to expected 
power showed that the sample sizes provided by the traditional formulas are 
too small since they neglect the imprecise nature of a variance estimate. 
Thus this may lead to a distorted power performance and unsatisfactory 
research outcome for the planned study. Notably, in his review of 30 
clinical trials published along with their pilot data in top medical journals, 
Vickers (2003) found that 80% of the studies were underpowered. It should be 
clear that the insufficient statistical power phenomenon does not occur 
exclusively in one particular scientific discipline. Also, Kraemer, 
Mintz, Noda, Tinklenberg and Yesavage (2006) cautioned against the use of pilot 
studies to perform power calculations for study proposals. It seems prudent, 
therefore, to emphasize that the embedded properties of the sample size 
procedures should be well understood before they are adopted by 
researchers in performing sample size evaluation. 

The distinct advantage of the prescribed methods is that they 
circumvent the uncertainty of sample variance by taking account of the 
underlying chi-square distribution of sample variance and permit a 
corrected sample size determination according to the desired assurance 
probability and expected power considerations. The two criteria of 
assurance probability and expected power are fundamentally different and 
provide potentially useful tools for finding sample sizes with good properties. 
However, the theoretical presentations and algebraic expressions in Browne 
(1995), Kieser and Wassmer (1996), and Julious and Owen (2006) appear to 
be diverse and incomplete. Two important aspects of these results should be 
pointed out. 

First, Kieser and Wassmer (1996) did not specifically address the 
issue of how to modify the sample variance in sample size calculations so 
that the expected power of a two-sample t test will attain the planned 
power. They only evaluated the expected power of the two-sample t test 
with sample sizes that are required to satisfy the selected assurance level of 
power. Second, although Julious and Owen (2006) studied the prescribed 
problem of determining the sample size under the notion of expected power, 
their analytical exposition for the overall or expected power function is 
complicated and does not conform to the adjusted sample variance 
approach. Consequently, the suggested sample size formula appears to be 
opaque and the procedure may be of less practical value in application. For 
pedagogical importance and practical interest, one must have a thorough 
understanding of the fundamental details of the sample size methodology 
before the technique is finally considered to be appropriate for making 
sound application. 

In addition to the abovementioned natural formulation, a Bayesian 
analysis is also viable. For example, O’Hagan, Stevens and Campbell 
(2005) proposed to incorporate a lognormal prior distribution for the 
unknown variance and computed the necessary sample size to achieve a 
desired expected power. However, the hyper-parameters of the prior 
distribution still have to be imputed and are usually elicited from reliable 
sources such as the knowledge of experts. On the other hand, Gillett (2001) 
described a Bayesian approach to calculate the sample size for a designated 
level of expected power with respect to the posterior distribution of 
probable effect sizes. The uncertainty in effect size is a combination of t 
value and sample size from a previous study, and a normal prior distribution 
with specified hyper-parameter values. 

Essentially, the Bayesian approach is more sophisticated than the 
standard setup described above. It is conceivable that the accuracy of the 
procedures depends mostly on the appropriateness of prior distributions 
which often accompanies subjective hyper-parameter values. The interested 
reader is referred to Gillett (2001), O’Hagan et al. (2005) and the references 
therein for further details. Here we focus on the problem of dealing with the 
uncertainty inherent in a sample variance estimate under the ultimate notion 
of choosing a profound adjusted factor to provide sufficient sample sizes 
with supplied power strength for subsequent main study. Although the 
discussion concentrated on testing the equality of two group means, the 
principles and procedures are also applicable in more complicated models 
and it embodies all the critical issues without the distracting complications 
of the multiple treatment groups and more sophisticated statistical models. 

In order to improve the quality of research findings, this article 
attempts to contribute to the derivation and evaluation of sample size 
methodology for two-sample t tests in two important and distinctive aspects. 
First, we present a simplified and constructive approach to sample size 
determination for the two-sample problem which computes the sample size 
required to guarantee that the actual power exceeds the planned power with 
a given assurance probability. Second, we reexamine the expected power 
approach to sample size determination through rigorous analytical 
presentations and numerical assessments. 

The general formulation described in this article combines the 
existing procedures into one unified framework. In the process, we attempt 
to provide a clear and concise exposition of the fundamental theoretical 
arguments, and conduct exact empirical investigations to demonstrate the 
potential deficiency of existing findings. Thus, a well-supported and useful 
recommendation can be offered for empirical studies. Numerical results are 
provided for a variety of situations to demonstrate the individual impact of 
deterministic factors and how they work as a whole pertaining to the two 
different power considerations. Furthermore, a numerical example is 
presented to illustrate the usefulness and advantage of the proposed 
methods that account for the embedded randomness and distributional 
characteristic of the sample variance. For practical purposes, the computer 
codes are developed to facilitate the recommended procedures for 
computing the necessary sample size in planning research designs. 

Fundamental Methodology 

Consider independent random samples from two normal populations 
with the following formulations: 

$$X_{1i} = \mu_1 + \varepsilon_{1i} \quad \text{and} \quad X_{2j} = \mu_2 + \varepsilon_{2j}, \tag{1}$$

where $\mu_1$ and $\mu_2$ are unknown parameters, and $\varepsilon_{1i}$ and $\varepsilon_{2j}$ are iid $N(0, \sigma^2)$ 
random variables, $i = 1, \ldots, N_1$, and $j = 1, \ldots, N_2$. For the purpose of detecting 
the group effect in terms of the hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$, 
the well-known test statistic $t$ under the model formulation in Equation 1 is 
of the form 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\{S^2 (1/N_1 + 1/N_2)\}^{1/2}}, \tag{2}$$

where 

$$\bar{X}_1 = \sum_{i=1}^{N_1} X_{1i}/N_1, \quad \bar{X}_2 = \sum_{j=1}^{N_2} X_{2j}/N_2, \quad S^2 = SSE/(N_1 + N_2 - 2)$$

is the usual unbiased estimator of $\sigma^2$, and 

$$SSE = \sum_{i=1}^{N_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{N_2} (X_{2j} - \bar{X}_2)^2.$$

For ease of exposition, the two sample sizes $N_1$ and $N_2$ are expressed as $N_1 = N$ 
and $N_2 = rN_1$, respectively, where $r > 0$ is a constant. If the null 
hypothesis $H_0: \mu_1 = \mu_2$ is true, then the statistic $t$ is distributed as $t(\nu)$, a 
central t distribution with $\nu = N_1 + N_2 - 2 = N(1 + r) - 2$ degrees of 
freedom; and $H_0$ is rejected at the significance level $\alpha$ if $|t| > t_{\nu,\alpha/2}$, where 
$t_{\nu,\alpha/2}$ is the upper $100(\alpha/2)$th percentile of the t distribution $t(\nu)$. Under the 
alternative hypothesis, $t$ has a noncentral t distribution $t(\nu, \delta)$ with $\nu$ degrees 
of freedom and noncentrality parameter $\delta$ 

$$t \sim t(\nu, \delta), \tag{3}$$

where $\delta = d/(c\sigma^2/N)^{1/2}$, $d = \mu_1 - \mu_2$, and $c = 1 + 1/r$. With the distribution in 
Equation 3, the associated power function is denoted by 

$$\pi(N, \sigma^2) = P\{|t(\nu, \delta)| > t_{\nu,\alpha/2}\}. \tag{4}$$

For the purpose of sample size determination, the power function defined 
in Equation 4 can be employed to calculate the sample size $N_t$ needed to 
attain the specified power $1 - \beta$ for the chosen significance level $\alpha$ and 
parameter values $(\mu_1, \mu_2, \sigma^2)$. Ultimately, the resulting sample size $N_t$ is the 
least integer that ensures $\pi(N_t, \sigma^2) = P\{|t(\nu_t, \delta_t)| > t_{\nu_t,\alpha/2}\} \geq 1 - \beta$, where 
$\nu_t = N_t(1 + r) - 2$ and $\delta_t = d/(c\sigma^2/N_t)^{1/2}$. 

Although the preceding sample size determination is well 
documented, this article considers the practical scenario that the variance 
parameter is estimated with a sample value available from published 
research or a preliminary study. The sample size calculations under this 
circumstance directly follow the prescribed procedure with the obvious 
substitution of $\sigma^2$ with a sample variance estimate, denoted by $\hat{\sigma}_p^2$. In this 
case, the adapted power function is 

$$\pi(N, \hat{\sigma}_p^2) = P\{|t(\nu, \hat{\delta}_p)| > t_{\nu,\alpha/2}\}, \tag{5}$$

where $\hat{\delta}_p = d/(c\hat{\sigma}_p^2/N)^{1/2}$. Accordingly, the required sample size $N_p$ is the 
least integer that guarantees $\pi(N_p, \hat{\sigma}_p^2) \geq 1 - \beta$. Clearly, the calculations for 
sample size $N_p$ are straightforward to apply and do not incur any extra 
complexity. However, the reported sample size $N_p$ tends to be 
underestimated ($N_p < N_t$). Consequently, a test using $N_p$ as the sample size is 
more likely to produce an actual power smaller than the planned power, i.e., 
$\pi(N_p, \sigma^2) < 1 - \beta$. The explanation is provided next to emphasize the 
importance for further investigation and improvement. 

Regarding the underlying randomness of the sample variance $\hat{\sigma}_p^2$, it is 
commonly assumed that it has the following distribution 

$$\hat{\sigma}_p^2 \sim \sigma^2 \chi^2(\upsilon)/\upsilon, \tag{6}$$

where $\chi^2(\upsilon)$ is a chi-square distribution with $\upsilon$ degrees of freedom. It is well 
known that $\hat{\sigma}_p^2$ is an unbiased estimator of $\sigma^2$; however, the right-skewness 
of the chi-square distribution gives $P\{\hat{\sigma}_p^2 < \sigma^2\} = P\{\chi^2(\upsilon) < \upsilon\} > 0.5$. 
Hence, an observed sample variance $\hat{\sigma}_p^2 < \sigma^2$ occurs more often than 
$\hat{\sigma}_p^2 > \sigma^2$. With the designated mean difference $d$, significance level $\alpha$, power level 
$1 - \beta$, and selected sample sizes $N_p$ and $N_t$, both power functions basically 
give the identical value $\pi(N_p, \hat{\sigma}_p^2) \approx \pi(N_t, \sigma^2) \approx 1 - \beta$. Note that the generic 
power function in Equation 4 is a monotone function of noncentrality $\delta$ for 
fixed degrees of freedom $\nu$, and that the discrepancy between the power 
values computed with different degrees of freedom is comparatively 
negligible. Thus, the identity $\pi(N_p, \hat{\sigma}_p^2) \approx \pi(N_t, \sigma^2)$ implies the approximate 
equivalence between the two noncentrality parameters or $\hat{\sigma}_p^2/N_p \approx \sigma^2/N_t$. 
Moreover, it leads to $N_p < N_t$ and $\pi(N_p, \sigma^2) < \pi(N_t, \sigma^2) \approx 1 - \beta$ if $\hat{\sigma}_p^2 < \sigma^2$. 
Hence, the calculated sample size $N_p$ based on observed sample variance 
$\hat{\sigma}_p^2$ has a high probability of giving insufficient power because $P\{\hat{\sigma}_p^2 < \sigma^2\} > 0.5$. 
This phenomenon has been demonstrated in the numerical presentations 
of Browne (1995). 
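
The size of the probability that a pilot variance falls below the true variance can be checked numerically. The following is a minimal sketch, assuming only the scaled chi-square model for the pilot variance; the function name `chi2_cdf` is hypothetical, and the chi-square distribution function is evaluated with a plain power series for the regularized incomplete gamma function rather than with a statistical package.

```python
import math


def chi2_cdf(x, df):
    """P{chi-square(df) <= x}, using the series
    gamma(a, y) = y^a e^(-y) * sum_{n>=0} y^n / (a (a+1) ... (a+n))
    with a = df / 2 and y = x / 2, divided by Gamma(a)."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


# Probability that the pilot variance underestimates the true variance,
# i.e. P{chi-square(df) < df}, for the degrees of freedom used later on.
for df in (10, 50, 100, 500):
    print(df, round(chi2_cdf(df, df), 4))
```

The probability exceeds 0.5 for every degrees-of-freedom value and approaches 0.5 only slowly as the pilot sample grows, which is the source of the underpowering described above.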

In view of the deficiency in the sample size calculation associated 
with power formula in Equation 5, it is of both theoretical interest and 
practical importance to consider alternative procedures. We apply the 
general idea in Browne (1995) in the following two different approaches to 
sample size determination with the use of an adjusted form of pilot sample 
variance $\hat{\sigma}_a^2 = a \cdot \hat{\sigma}_p^2$, where $a$ is a constant to be chosen. With the use of a 
multiple of sample variance $\hat{\sigma}_a^2$ in place of $\hat{\sigma}_p^2$ in Equation 5, the power 
function for sample size calculation is modified as 

$$\pi(N, \hat{\sigma}_a^2) = P\{|t(\nu, \hat{\delta}_a)| > t_{\nu,\alpha/2}\}, \tag{7}$$

where $\hat{\delta}_a = d/(c\hat{\sigma}_a^2/N)^{1/2}$. Accordingly, the required sample size $N_a$ is the 
minimum sample size so that $\pi(N_a, \hat{\sigma}_a^2) \geq 1 - \beta$ and the actual power is the 
value of $\pi(N_a, \sigma^2)$. 
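
The search for the minimum sample size under a plugged-in variance value can be sketched in a few lines. This is an illustrative sketch only, not the supplementary program of the article: all function names (`nct_cdf`, `t_critical`, `power`, `sample_size`) are hypothetical, and the noncentral t distribution function is obtained by Simpson integration over the chi-square density (an assumption made to stay within the Python standard library) instead of a packaged routine.

```python
import math
from statistics import NormalDist


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi2_pdf(k, df):
    """Chi-square density; the value at zero is the limit for df >= 2."""
    if k <= 0.0:
        return 0.5 if df == 2 else 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(k) - k / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def nct_cdf(x, df, delta=0.0, steps=1000):
    """P{T <= x} for T = (Z + delta) / sqrt(K / df), K ~ chi-square(df),
    computed as E_K[Phi(x * sqrt(K / df) - delta)] with Simpson's rule."""
    lo = max(0.0, df - 10.0 * math.sqrt(2.0 * df))
    hi = df + 10.0 * math.sqrt(2.0 * df) + 40.0
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        k = lo + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(x * math.sqrt(k / df) - delta) * chi2_pdf(k, df)
    return total * width / 3.0


def t_critical(df, alpha):
    """Upper 100(alpha/2)th percentile of the central t, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df) < 1.0 - alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(n, variance, d, alpha=0.05, r=1.0):
    """Two-sided power with group sizes N1 = n and N2 = r * n."""
    df = n * (1.0 + r) - 2.0
    delta = abs(d) / math.sqrt((1.0 + 1.0 / r) * variance / n)
    crit = t_critical(df, alpha)
    return 1.0 - nct_cdf(crit, df, delta) + nct_cdf(-crit, df, delta)


def sample_size(variance, d, target=0.90, alpha=0.05, r=1.0):
    """Least n with power >= target for the supplied variance value."""
    z = NormalDist()
    guess = ((1.0 + 1.0 / r) * variance
             * (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(target)) ** 2 / d ** 2)
    n = max(2, math.floor(guess))  # start near the normal approximation
    while n > 2 and power(n - 1, variance, d, alpha, r) >= target:
        n -= 1
    while power(n, variance, d, alpha, r) < target:
        n += 1
    return n
```

Passing an adjusted variance (a multiple of the pilot variance) as `variance` gives the corrected sample size, while passing the raw pilot variance gives the uncorrected one. The search starts near the normal-approximation value purely to save computing time; because power increases with the sample size, the two loops still return the least integer that meets the target.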

To account for the stochastic nature of pilot sample variance in 
sample size determination, the overall or unconditional considerations of 
assurance probability and expected power are presented next. It is 
noteworthy that the two principles of assurance probability and expected 
power are closely related to the two standard criteria of consistency and 
unbiasedness in statistical point estimation, respectively. In other words, 
these two measures impose unique and distinct aspects of desirable 
characteristics on the resulting power behavior, and each principle has 
conceptual and empirical implications in its own right. 
Assurance Probability Approach 

Since $\hat{\sigma}_p^2$ is a random variable with the scaled chi-square distribution 
given in Equation 6, the power function defined in Equation 7 is also a 
random variable. Accordingly, the power function of Equation 7 based on 
an observed sample variance $\hat{\sigma}_p^2$ in $\hat{\sigma}_a^2 = a \cdot \hat{\sigma}_p^2$ can be viewed as an estimate 
because the sample variance estimate $\hat{\sigma}_p^2$ represents a realization of all 
possible outcomes. It is constructive to ensure the actual power is at least as 
large as the planned power with a designated assurance probability. 

Specifically, given mean difference $d$, significance level $\alpha$, planned 
power $1 - \beta$ and assurance probability $1 - \gamma$, this procedure seeks to find 
the proper correction factor $a$ so that it satisfies the equality 

$$\Gamma(\upsilon, \hat{\sigma}_a^2) = P\{\pi(N_a, \sigma^2) \geq 1 - \beta\} = 1 - \gamma. \tag{8}$$

It is important to note that the assurance probability $\Gamma(\upsilon, \hat{\sigma}_a^2)$ depends 
on the unknown parameter $\sigma^2$ and cannot be evaluated in practice. However, 
a useful and accurate approximation can be obtained. Following an 
argument similar to that employed above for the direct use of uncorrected 
sample variance, it is clear that $\pi(N_a, \hat{\sigma}_a^2) \approx \pi(N_t, \sigma^2) \approx 1 - \beta$ and 
$\hat{\sigma}_a^2/N_a \approx \sigma^2/N_t$. Thus, 

$$P\{\pi(N_a, \sigma^2) \geq 1 - \beta\} \approx P\{\pi(N_a, \sigma^2) \geq \pi(N_t, \sigma^2)\} = P(N_a \geq N_t) \approx P(\hat{\sigma}_a^2 \geq \sigma^2). \tag{9}$$

With the specified distribution of sample variance given in Equation 
6, the assurance probability $\Gamma(\upsilon, \hat{\sigma}_a^2)$ can be well approximated by 

$$\Gamma(\upsilon, \hat{\sigma}_a^2) \approx P(K \geq \upsilon/a), \tag{10}$$

where $K = \upsilon\hat{\sigma}_p^2/\sigma^2 \sim \chi^2(\upsilon)$. It is important to note that the approximation is 
independent of the parameter values of mean difference $d$ and error variance 
$\sigma^2$. For the selected assurance level of $1 - \gamma$, the assurance probability 
$\Gamma(\upsilon, \hat{\sigma}_a^2) \approx 1 - \gamma$ if $a = g$ where 

$$g = \upsilon/\chi^2_{\upsilon,\gamma} \tag{11}$$

and $\chi^2_{\upsilon,\gamma}$ is the $100\gamma$th percentile of a chi-square distribution with $\upsilon$ degrees 
of freedom. 
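
The correction factor in Equation 11 needs only a chi-square percentile. A minimal sketch follows, assuming the Python standard library only; the names `chi2_cdf`, `chi2_quantile`, and `assurance_factor` are hypothetical, and the percentile is found by bisection on a series evaluation of the chi-square distribution function rather than by a packaged quantile routine.

```python
import math


def chi2_cdf(x, df):
    """P{chi-square(df) <= x} via the incomplete gamma power series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def chi2_quantile(p, df):
    """100p-th percentile of chi-square(df), by bisection on the CDF."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def assurance_factor(df, assurance=0.80):
    """g = df / (100*gamma-th chi-square percentile), gamma = 1 - assurance."""
    return df / chi2_quantile(1.0 - assurance, df)


for df in (10, 50, 100, 500):
    print(df, round(assurance_factor(df), 4))
```

Because the factor depends only on the pilot degrees of freedom and the assurance level, it can be tabulated once and reused for any mean difference or variance.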

For the purpose of sample size determination, the modified power 
function given in Equation 7 with $a = g$ can be employed to calculate the 
sample size $N_g$ needed to test hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$ in 
order to attain the specified assurance probability $(1 - \gamma)$ for planned power 
$(1 - \beta)$ with significance level $\alpha$ and mean difference $d$. Accordingly, the 
computed sample size $N_g$ varies with the actual value of the pilot sample 
variance from one application to another. Let $z_{\alpha/2}$ and $z_\beta$ denote the upper 
$100(\alpha/2)$th and $100\beta$th percentiles of the standard normal distribution, 
respectively. Since the sample size $N_g$ conditional on $\hat{\sigma}_g^2 = g \cdot \hat{\sigma}_p^2$ is 
approximately equivalent to $N_g \approx c\hat{\sigma}_g^2(z_{\alpha/2} + z_\beta)^2/d^2$, the unconditional or 
expected sample size $N(\upsilon, \hat{\sigma}_g^2)$ is $N(\upsilon, \hat{\sigma}_g^2) = E\{N_g\} \approx E\{c\hat{\sigma}_g^2(z_{\alpha/2} + z_\beta)^2/d^2\} = c \cdot g \cdot \sigma^2(z_{\alpha/2} + z_\beta)^2/d^2$. 

Similarly, the expected sample size corresponding to the usual 
procedure without sample variance correction is $N(\upsilon, \hat{\sigma}_p^2) = E\{N_p\} \approx 
E\{c\hat{\sigma}_p^2(z_{\alpha/2} + z_\beta)^2/d^2\} = c \cdot \sigma^2(z_{\alpha/2} + z_\beta)^2/d^2$. Obviously, $N_p < N_g$ and 
$N(\upsilon, \hat{\sigma}_p^2) < N(\upsilon, \hat{\sigma}_g^2)$ if $g > 1$. Note that the approximation in Equation 10 
was also presented in Kieser and Wassmer (1996) to justify the simulation 
results in Browne (1995). However, a close examination shows the 
theoretical justification in Kieser and Wassmer (1996) involves some minor 
and unstated simplifications about the interchange of degrees of freedom 
$\nu_g = N_g(1 + r) - 2$ and $\nu_t = N_t(1 + r) - 2$ in our notation. This is the key 
argument that prevents a direct proof of $N_t < N_g$ in the foregoing exposition 
and should be explicitly stated. 
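
The two expected sample size approximations above differ only by the multiplier applied to the variance, which a short calculation makes concrete. The sketch below assumes the normal-approximation formula quoted in the text; the function name `expected_sample_size` and its parameter names are hypothetical, and the result is left unrounded.

```python
from statistics import NormalDist


def expected_sample_size(d, sigma2, factor=1.0, alpha=0.05, power=0.90, r=1.0):
    """Approximate expected per-group size c * factor * sigma2 * (z_a + z_b)^2 / d^2.

    factor = 1 corresponds to the unadjusted pilot variance; factor = g (or
    any other multiplier) corresponds to an adjusted pilot variance.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # upper 100(alpha/2)th percentile
    z_beta = z.inv_cdf(power)                # upper 100(beta)th percentile
    c = 1.0 + 1.0 / r
    return c * factor * sigma2 * (z_alpha + z_beta) ** 2 / d ** 2


plain = expected_sample_size(d=0.5, sigma2=1.0)
adjusted = expected_sample_size(d=0.5, sigma2=1.0, factor=1.6184)  # g for 10 df
print(round(plain, 2), round(adjusted, 2))
```

Under this approximation the expected cost of the correction is simply a proportional increase of the unadjusted expected sample size by the chosen factor.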

In order to examine the performance of the prescribed approximate 
method for sample size determinations, numerical investigations are 
performed for the detection of mean difference. The actual computation of 
the exact assurance probability $\Gamma(\upsilon, \hat{\sigma}_g^2)$ in Equation 8 requires the 
evaluation of the noncentral t cumulative distribution function and 
one-dimensional integration with respect to a chi-square distribution. 
Specifically, for each possible value of the sample variance $\hat{\sigma}_p^2$ with a 
scaled chi-square distribution, the least sample size $N_g$ is computed so that 
the corresponding power $\pi(N_g, \hat{\sigma}_g^2) \geq 1 - \beta$, where $\hat{\sigma}_g^2 = g \cdot \hat{\sigma}_p^2$ and $g$ is 
calculated as in Equation 11. Also, the actual power $\pi(N_g, \sigma^2)$ obtained with 
sample size $N_g$ is evaluated. Then the exact assurance probability is the 
expected value $\Gamma(\upsilon, \hat{\sigma}_g^2) = E_K[I(N_g, \sigma^2)]$, where $E_K[\cdot]$ denotes the 
expectation taken with respect to the chi-square distribution of $K \sim \chi^2(\upsilon)$ 
given in Equation 10, and $I(N_g, \sigma^2) = 1$ if $\pi(N_g, \sigma^2) \geq 1 - \beta$ and $I(N_g, \sigma^2) = 0$ 
if $\pi(N_g, \sigma^2) < 1 - \beta$. Likewise, the exact expected sample size is the 
expected value $N(\upsilon, \hat{\sigma}_g^2) = E_K[N_g]$. Moreover, the exact values of $\Gamma(\upsilon, \hat{\sigma}_p^2)$ 
and $N(\upsilon, \hat{\sigma}_p^2)$ can be obtained with the same computing procedures. It 
should be noted that neither Browne (1995) nor Kieser and Wassmer (1996) 
conducted exact computations of assurance probability. Only simulation 
results were reported in Browne (1995). 

To facilitate the application of sample size planning under assurance 
probability consideration, selected comparisons of exact and approximate 
results associated with and without sample variance modification were 
performed. The adequacy of the approximate procedure is determined by 
the error = exact assurance probability - approximate assurance probability. 
For the chosen model configurations of $N_1 = N_2$, $\sigma^2 = 1$, planned power 
$(1 - \beta) = 0.90$, assurance probability $(1 - \gamma) = 0.80$, and $\alpha = 0.05$, the assurance 
probabilities $\Gamma(\upsilon, \hat{\sigma}_p^2)$ and $\Gamma(\upsilon, \hat{\sigma}_g^2)$ for sample variance degrees of 
freedom $\upsilon$ = 10, 50, 100, and 500 are listed in Tables 1-3 for $d$ = 0.25, 0.50 
and 1, respectively. In addition, the corresponding exact and approximate 
magnitudes of expected sample sizes $N(\upsilon, \hat{\sigma}_p^2)$ and $N(\upsilon, \hat{\sigma}_g^2)$ are also 
presented. The correction multiplier $g$ in $\hat{\sigma}_g^2$ is $g = \upsilon/\chi^2_{\upsilon,0.20}$ = 1.6184, 
1.2063, 1.1371 and 1.0566 for $\upsilon$ = 10, 50, 100, and 500, respectively. 

Examination of the exact values of $\Gamma(\upsilon, \hat{\sigma}_p^2)$ and $\Gamma(\upsilon, \hat{\sigma}_g^2)$ in Tables 
1-3 confirms the undesired behavior of the usual procedure of directly 
substituting sample variance $\hat{\sigma}_p^2$ for $\sigma^2$ in sample size calculation. 
Specifically, the values of $\Gamma(\upsilon, \hat{\sigma}_p^2)$ in Tables 1-3 fall between 
0.4408 and 0.4971, which is too low to be satisfactory in power 
assurance. In contrast, the results corresponding to the corrected 
counterpart $\hat{\sigma}_g^2$ are nearly equivalent to the designated level of 0.80. 
Certainly, the discrepancy between the assurance probabilities $\Gamma(\upsilon, \hat{\sigma}_p^2)$ and $\Gamma(\upsilon, \hat{\sigma}_g^2)$ 
simply reflects the substantial difference between the expected sample sizes 
$N(\upsilon, \hat{\sigma}_p^2)$ and $N(\upsilon, \hat{\sigma}_g^2)$. Regarding the performance of the suggested 
approximate formulas, there is a close agreement between the exact and 
approximate values of expected sample size and assurance probability. The 
largest absolute error of expected sample size is 1.52 while the maximum 
absolute error of assurance probability is 0.0055. According to these 
findings, the performance of the approximate method given in Equation 10 
with $a = g$ seems to be reasonably good for the range of model 
specifications considered here. Therefore, the required sample size to ensure 
sufficient assurance probability can be computed with the adapted 
procedure with adjusted sample variance $\hat{\sigma}_g^2$ so that the possible low 
statistical power problem for detecting mean difference can be recognized 
in advance. 

Expected Power Approach 

Instead of assurance probability appraisal, an alternative criterion for 
sample size determination is agreement of expected actual power with 
planned power. Following the underlying assumption of sample variance 
defined in Equation 6, we continue the adjusted sample variance strategy in 
order to provide a unified framework for both the assurance probability and 
expected power considerations. 

Accordingly, given mean difference $d$, significance level $\alpha$, and 
planned power $1 - \beta$, it is desired to find the proper adjustment factor $a$ so that 
the following equality is fulfilled 

$$\Pi(\upsilon, \hat{\sigma}_a^2) = E_K[\pi(N_a, \sigma^2)] = 1 - \beta, \tag{12}$$

where $E_K[\cdot]$ denotes the expectation taken with respect to the chi-square 
distribution of $K \sim \chi^2(\upsilon)$ given in Equation 10. It is obvious that the 
evaluation of expected power $\Pi(\upsilon, \hat{\sigma}_a^2)$ in Equation 12 requires the 
specification of the presumably unknown parameter $\sigma^2$. A feasible 
approximation is presented in the following. 

Note that $\pi(N_a, \sigma^2) \approx P\{|Z + \delta_a| > z_{\alpha/2}\}$ for moderate sample size 
$N_a$, where $Z$ has a standard normal distribution and $\delta_a = d/(c\sigma^2/N_a)^{1/2}$. Since 
$N_a$ is the minimum sample size so that $\pi(N_a, \hat{\sigma}_a^2) \geq 1 - \beta$, it follows that 
$N_a \approx c\hat{\sigma}_a^2(z_{\alpha/2} + z_\beta)^2/d^2$ and $\delta_a \approx a^{1/2}(z_{\alpha/2} + z_\beta)(K/\upsilon)^{1/2}$. Hence the 
actual power conditional on $K$ is 

$$\pi(N_a, \sigma^2) \approx P\{(Z + z_{\alpha/2})/(K/\upsilon)^{1/2} < -a^{1/2}(z_{\alpha/2} + z_\beta)\} + P\{(Z + z_{\alpha/2})/(K/\upsilon)^{1/2} < a^{1/2}(z_{\alpha/2} + z_\beta)\}. \tag{13}$$

It follows from the definition of a noncentral t distribution (Rencher, 
2000, p. 102) that the suggested approximation to the expected power is of the 
form 

$$\Pi(\upsilon, \hat{\sigma}_a^2) \approx P\{T < -a^{1/2}(z_{\alpha/2} + z_\beta)\} + P\{T < a^{1/2}(z_{\alpha/2} + z_\beta)\}, \tag{14}$$

where $T \sim t(\upsilon, z_{\alpha/2})$. An important aspect of the simplified expression is 
that it does not depend on the unknown variance $\sigma^2$. Since the cumulative 
distribution function of a noncentral t distribution is readily embedded in 
modern statistical packages such as the SAS system, a standard iterative 
search can be conducted to find the correction factor $a = h$ so that 

$$P\{T < -h^{1/2}(z_{\alpha/2} + z_\beta)\} + P\{T < h^{1/2}(z_{\alpha/2} + z_\beta)\} = 1 - \beta. \tag{15}$$

The search can be simplified with $P\{T < h^{1/2}(z_{\alpha/2} + z_\beta)\} = 1 - \beta$ 
because $P\{T < -h^{1/2}(z_{\alpha/2} + z_\beta)\}$ is generally negligible for large power 
$(1 - \beta)$. Although a similar formula of expected power was described in 
Equation 5 of Kieser and Wassmer (1996), they did not specifically address 
the issue of how to modify the sample variance in sample size calculations 
so that the expected power of a two-sample t test will attain the planned 
power. In fact, their presentation focused on the expected power 
calculations of the two-sample t test with sample sizes that are required to 
satisfy the selected assurance level of power. 
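
The iterative search for the expected power correction factor can be sketched as a bisection on the two-term noncentral t expression. This is an illustrative sketch under stated assumptions, not the SAS program referred to in the text: the names `nct_cdf`, `approx_expected_power`, and `expected_power_factor` are hypothetical, the noncentral t distribution function is computed by Simpson integration over the chi-square density to avoid external libraries, and the pilot degrees of freedom are assumed to be at least 3.

```python
import math
from statistics import NormalDist


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def chi2_pdf(k, df):
    """Chi-square density for df >= 3 (the density vanishes at zero)."""
    if k <= 0.0:
        return 0.0
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(k) - k / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def nct_cdf(x, df, delta, steps=1000):
    """P{T <= x} for T = (Z + delta) / sqrt(K / df), K ~ chi-square(df),
    computed as E_K[Phi(x * sqrt(K / df) - delta)] with Simpson's rule."""
    lo = max(0.0, df - 10.0 * math.sqrt(2.0 * df))
    hi = df + 10.0 * math.sqrt(2.0 * df) + 40.0
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        k = lo + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * norm_cdf(x * math.sqrt(k / df) - delta) * chi2_pdf(k, df)
    return total * width / 3.0


def approx_expected_power(a, df, alpha=0.05, power=0.90):
    """Two-term approximation with T ~ t(df, z_{alpha/2}) and multiplier a."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    bound = math.sqrt(a) * (z_alpha + z.inv_cdf(power))
    return nct_cdf(-bound, df, z_alpha) + nct_cdf(bound, df, z_alpha)


def expected_power_factor(df, alpha=0.05, power=0.90):
    """Bisection for the factor h at which the approximation equals `power`."""
    lo, hi = 0.5, 5.0  # assumed bracket for h
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if approx_expected_power(mid, df, alpha, power) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for df in (10, 50):
    print(df, round(expected_power_factor(df), 4))
```

Bisection is used here because the approximate expected power increases with the multiplier over the bracketed range; any other one-dimensional root finder would serve equally well.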

For the practical purpose of sample size determination, the modified 
power function given in Equation 7 with $a = h$ can be employed to calculate 
the sample size $N_h$ needed to test hypothesis $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$ 
in order to attain the expected power $(1 - \beta)$ with the chosen significance 
level $\alpha$ and mean difference $d$. In this case, since the sample size $N_h$ 
conditional on $\hat{\sigma}_h^2 = h \cdot \hat{\sigma}_p^2$ is approximately equivalent to $N_h \approx c\hat{\sigma}_h^2(z_{\alpha/2} + z_\beta)^2/d^2$, 
the unconditional or expected sample size $N(\upsilon, \hat{\sigma}_h^2)$ is $N(\upsilon, \hat{\sigma}_h^2) = 
E\{N_h\} \approx E\{c\hat{\sigma}_h^2(z_{\alpha/2} + z_\beta)^2/d^2\} = c \cdot h \cdot \sigma^2(z_{\alpha/2} + z_\beta)^2/d^2$. As described 
earlier, the expected sample size corresponding to the usual procedure 
without sample variance correction is $N(\upsilon, \hat{\sigma}_p^2) \approx c \cdot \sigma^2(z_{\alpha/2} + z_\beta)^2/d^2$. 
Hence, $N_p < N_h$ and $N(\upsilon, \hat{\sigma}_p^2) < N(\upsilon, \hat{\sigma}_h^2)$ if $h > 1$. 

To enhance the applicability of the proposed approximate procedure 
for sample size calculations, the accuracy of correction factor h is 
demonstrated with the differences between the exact and approximate 
expected power. Due to the complexity in the computation of the exact 
expected power, a computing algorithm similar to the one in the assurance 
probability approach is developed. As in the empirical investigation of 
assurance probability, selected comparisons of exact and approximate 
results associated with and without sample variance modification were 
performed. Likewise, the adequacy of the approximate procedure is 
determined by the error = exact expected power - approximate expected 
power. For the chosen model configurations of $N_1 = N_2$, $\sigma^2 = 1$, planned 
power $(1 - \beta) = 0.90$, and $\alpha = 0.05$, the expected power $\Pi(\upsilon, \hat{\sigma}_p^2)$ and 
$\Pi(\upsilon, \hat{\sigma}_h^2)$ for sample variance degrees of freedom $\upsilon$ = 10, 50, 100, and 500 
are listed in Tables 1-3 for $d$ = 0.25, 0.50 and 1, respectively. For ease of 
reference, we also summarize the corresponding exact and approximate 
results of expected sample sizes $N(\upsilon, \hat{\sigma}_p^2)$ and $N(\upsilon, \hat{\sigma}_h^2)$. The computed 
correction value $h$ in $\hat{\sigma}_h^2$ is $h$ = 1.3005, 1.0531, 1.0262 and 1.0052 for $\upsilon$ = 
10, 50, 100, and 500, respectively. 

As can be seen from the exact values in Tables 1-3, it is not 
surprising that the expected sample size $N(\upsilon, \hat{\sigma}_p^2) < N(\upsilon, \hat{\sigma}_h^2)$ 
and expected power $\Pi(\upsilon, \hat{\sigma}_p^2) < \Pi(\upsilon, \hat{\sigma}_h^2)$ because the reported 
correction $h > 1$. Hence an adjusted sample variance is required to perform 
sample size calculation for the designated expected power, especially when the 
degrees of freedom $\upsilon$ of the pilot sample variance is small. For the power 
level $1 - \beta = 0.90$, the exact and approximate expected power $\Pi(\upsilon, \hat{\sigma}_h^2)$ 
are almost identical with the largest absolute error 0.0065 for the case of $\upsilon$ 
= 500 in Table 3. The errors associated with expected power $\Pi(\upsilon, \hat{\sigma}_p^2)$ are 
slightly larger than those of $\Pi(\upsilon, \hat{\sigma}_h^2)$, but they are less than 0.01 and are 
generally acceptable. Therefore the accurate approximate procedure given 
in Equation 14 with $a = h$ or Equation 15 can be employed to calculate the 
sample size needed for the chosen level of expected power. Without the 
modification of a pilot sample variance, the resulting sample size and 
expected power may be too small to be satisfactory. For completeness, the 
expected power $\Pi(\upsilon, \hat{\sigma}_g^2)$ and assurance probability $\Gamma(\upsilon, \hat{\sigma}_h^2)$ are also 
presented in Tables 1-3. The contrasting values of $\Gamma(\upsilon, \hat{\sigma}_g^2)$, $\Pi(\upsilon, \hat{\sigma}_g^2)$, 
$\Gamma(\upsilon, \hat{\sigma}_h^2)$, and $\Pi(\upsilon, \hat{\sigma}_h^2)$ reveal that the difference between $\Gamma(\upsilon, \hat{\sigma}_g^2)$ and 
$\Gamma(\upsilon, \hat{\sigma}_h^2)$ is greater than that between $\Pi(\upsilon, \hat{\sigma}_g^2)$ and $\Pi(\upsilon, \hat{\sigma}_h^2)$. In other words, 
the change of adjustment factor from $g$ to $h$ ($g > h$) incurs a more substantial 
decline in assurance probability than in expected power. Hence, the assurance 
probability criterion is more sensitive to the adjustment of sample variance. 
Although the variance parameter of the pilot sample variance is fixed as $\sigma^2 = 1$ 
throughout the numerical comparison, the results in Tables 1-3 are also 
applicable for any magnitude of variance as long as the ratio of mean 
difference and standard deviation $d/\sigma$ = 0.25, 0.50 and 1. 

Although the expected power approach has been considered in Kieser 
and Wassmer (1996) and Julious and Owen (2006), their results are 
insufficient and cumbersome. First, Kieser and Wassmer (1996) presented 
an approximation for the computation of $\Pi(\nu, \hat{\sigma}_g^2)$ and did not address the
search for a correction factor h such that the planned
expected power is attained. On the other hand, Julious and Owen (2006)
derived a formula for the expected power and used the equation to compute the
sample size needed to achieve the required expected power. Although the
presented formula in Equation 9 of Julious and Owen (2006) appears to 
give similar results, their analytical derivation is complicated and is very 
different from the exposition presented in this article. In addition, no exact 
formulation and examination about the expected power of their 
approximation were provided. They suggested the derived formulas for 
accurate sample size calculations when the number of degrees of freedom of 
preliminary sample variance is less than 200. However, the values of
expected power $\Pi(\nu, \hat{\sigma}^2)$ and $\Pi(\nu, \hat{\sigma}_h^2)$ are nearly equivalent for $\nu > 100$
in Tables 1-3. Hence, this suggests that the correction of sample size is
fruitful when the degrees of freedom $\nu < 100$. On the other hand, the use of
an adjusted sample variance in sample size determination is always
recommended in view of the fact that the assurance probability $\Gamma(\nu, \hat{\sigma}^2)$ is
far less than $\Gamma(\nu, \hat{\sigma}_g^2)$ even for values of $\nu$ as large as 500 in Tables 1-3.

Numerical Example 

For illustration, the aforementioned two methods for sample size
calculations for the two-sided two-sample t-test defined in Equation 2 are
exemplified with balanced group sizes. Assume that a sample variance estimate
$\hat{\sigma}^2 = 100$ with $\nu = 50$ degrees of freedom is available from a
similar study. With a mean difference of $d = 5$ units, it follows from the
suggested procedure that a sample size of $N_1 = N_2 = 103$ is required to
guarantee a chance of $1 - \gamma = 0.80$ that the actual power exceeds 0.90 with $\alpha
= 0.05$. On the other hand, the minimum sample size of $N_1 = N_2 = 90$ is
necessary to ensure the expected power is at least $1 - \beta = 0.90$ with $\alpha =
0.05$. It is interesting to note that the approximate assurance probability with
the smaller sample size of $N_1 = N_2 = 90$ rather than 103 turns out to be about
0.5751. In contrast, the approximate expected power attained with the
sample size $N_1 = N_2 = 103$ instead of 90 is as high as 0.9322. Accordingly,
the two considerations of assurance probability and expected power are 
fundamentally distinct and yield markedly different sample sizes. The 
differential phenomenon should continue to exist for other settings. 
Furthermore, the sample size of $N_1 = N_2 = 86$ given by the conventional
sample size formula using the sample variance as the population variance
provides an assurance probability of only $P\{\hat{\sigma}^2 > \sigma^2\} = P\{\chi^2(50) > 50\} =
0.4734$ that the actual power will attain the planned power. Moreover, the
corresponding approximate expected power is 0.8858. The empirical
assurance results help to exemplify the entrenched difference in sample size 
determination between a stochastic sample variance and the stationary 
population variance. The SAS/IML (SAS Institute, 2011) programs 
employed to perform the sample size calculations are available upon 
request. 
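
The calculations in this example can be sketched in Python as follows. This is not the authors' SAS/IML program and does not implement Equation 14 or Equation 15. It is a minimal illustration that assumes SciPy is available, uses the noncentral t distribution for power, takes the assurance factor from the chi-square quantile implied by $\nu\hat{\sigma}^2/\sigma^2 \sim \chi^2(\nu)$, and averages power numerically over that chi-square distribution. The function names (`power_t`, `min_n`, `assurance_factor`, `expected_power`) and the grid size are hypothetical choices made for this sketch.

```python
import numpy as np
from scipy import stats

def power_t(n, d, sigma2, alpha=0.05):
    """Power of the two-sided two-sample t test with n subjects per group.

    The rejection probability in the opposite tail is negligible for the
    settings used here and is omitted (an assumption of this sketch).
    """
    df = 2 * n - 2
    ncp = abs(d) / np.sqrt(2.0 * np.asarray(sigma2, dtype=float) / n)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp)

def min_n(d, sigma2, power=0.90, alpha=0.05):
    """Smallest per-group n whose power reaches the target for variance sigma2."""
    n = 2
    while power_t(n, d, sigma2, alpha) < power:
        n += 1
    return n

def assurance_factor(nu, assurance):
    """Factor g with P(g * s2 >= sigma2) = assurance when nu*s2/sigma2 ~ chi2(nu)."""
    return nu / stats.chi2.ppf(1 - assurance, nu)

def expected_power(n, d, s2, nu, alpha=0.05, grid=1000):
    """Average power_t over variances consistent with nu*s2/sigma2 ~ chi2(nu)."""
    u = (np.arange(grid) + 0.5) / grid          # midpoints of (0, 1)
    sigma2 = nu * s2 / stats.chi2.ppf(u, nu)    # implied variance at each quantile
    return float(np.mean(power_t(n, d, sigma2, alpha)))

# Settings of the numerical example: pilot variance 100 on 50 df, difference 5.
s2, nu, d = 100.0, 50, 5.0

n_conventional = min_n(d, s2)            # pilot variance treated as the truth
g = assurance_factor(nu, 0.80)           # inflation for 80% assurance
n_assurance = min_n(d, g * s2)           # sample size with the adjusted variance
p_unadjusted = stats.chi2.sf(nu, nu)     # P{chi2(50) > 50}

print(n_conventional, n_assurance, round(g, 4), round(p_unadjusted, 4))
for n in (86, 90, 103):
    print(n, round(expected_power(n, d, s2, nu), 4))
```

Under these assumptions the script should give per-group sizes of 86 (conventional) and 103 (80% assurance) and an unadjusted assurance probability of about 0.4734. The averaged power values for 86, 90 and 103 should land close to, but not necessarily equal to, the approximate figures quoted in the text, because the article's own approximation is not reproduced here.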


Concluding Remarks 

Sample size determination is an important step in study planning. This 
paper describes the methodology for determining the necessary sample size 
of two-sample t-tests with adjusted sample variance under assurance
probability and expected power considerations. The presented methods
permit imprecision in the variance estimate and exploit the stochastic
distribution of sample variance in the calculations. In contrast, a direct use 
of a sample variance from pilot study or previous research can lead to 
serious underestimation of the necessary sample size and distortion of the 
desired power in detecting treatment differences. Although the proposed 
techniques are described in the context of a two-sample problem regarding 
the equality of two group means, the principles and procedures apply to 
more complex settings such as ANOVA and linear regression models. 
These techniques should prove efficacious in conducting a priori power 
analysis and sample size calculation under circumstances of imprecise 
information based on the findings of published research and preliminary 
study. 